Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
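
As a rough illustration of two practices the survey asks about, k-fold cross-validation on the training set and ensembling of identical models, the following minimal sketch trains one toy model per fold and averages their predictions. The helper names (`k_fold_indices`, `train_mean_model`, `cross_validated_ensemble`) and the "model" that simply predicts the mean training label are assumptions made for this example; they stand in for a real network and are not taken from the paper.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_mean_model(train_labels):
    """Toy stand-in for a network: always predicts the mean training label."""
    m = mean(train_labels)
    return lambda x: m


def cross_validated_ensemble(labels, k):
    """Train one toy model per fold (on the other k-1 folds) and average them."""
    models = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train_labels = [y for i, y in enumerate(labels) if i not in held]
        models.append(train_mean_model(train_labels))
    # Ensemble of identical model types: average the individual predictions.
    return lambda x: mean(m(x) for m in models)


folds = k_fold_indices(10, 3)
ensemble = cross_validated_ensemble([1.0, 2.0, 3.0, 4.0], k=2)
prediction = ensemble(None)  # the toy models ignore their input
```

In practice each held-out fold would also be used to validate the model trained on the remaining folds; that step is omitted here to keep the sketch short.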

Translated by Google Translate

An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression

Xue Yu, Yifan Sun, Haijun Zhou

Categories: Machine Learning

2022-11-28

The high-dimensional linear regression model is the most popular statistical model for high-dimensional data, but obtaining a sparse set of regression coefficients remains a challenging task. In this paper, we propose a simple heuristic algorithm for constructing sparse high-dimensional linear regression models. It is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. The algorithm constructs the support of the regression coefficients under the guidance of the least-squares solution of the recursively decimated linear equations, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with the highly correlated measurement matrices encountered in real-world applications.

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li

Categories: Natural Language Processing

2022-11-05

End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it easy to use a single model for both multilingual ASR and many-to-many ST. In this paper, we propose streaming language-agnostic multilingual speech recognition and translation using neural transducers (LAMASSU). To enable multilingual text generation in LAMASSU, we conduct a systematic comparison between specified and unified prediction and joint networks. We leverage a language-agnostic multilingual encoder that substantially outperforms shared encoders. To enhance LAMASSU, we propose to feed target LID to encoders. We also apply connectionist temporal classification regularization to transducer training. Experimental results show that LAMASSU not only drastically reduces the model size but also outperforms monolingual ASR and bilingual ST models.

Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization

Xue Yang, Gefan Zhang, Xiaojiang Yang, Yue Zhou, Wentao Wang, Jin Tang, Tao He, Junchi Yan

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-09-22

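Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects, with an additional rotation angle parameter for rotated objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection, especially for high-precision detection with a high IoU threshold (e.g., 0.75). Instead, we propose to model rotated objects as Gaussian distributions. A direct advantage is that our new regression loss, based on the distance between two Gaussians, e.g., the Kullback-Leibler Divergence (KLD), aligns well with the actual detection performance metric, which is not well addressed in existing methods. Moreover, two bottlenecks, namely boundary discontinuity and the square-like problem, also disappear. We also propose an efficient Gaussian metric-based label assignment strategy to further boost performance. Interestingly, by analyzing the gradients of the BBox parameters under our Gaussian-based KLD loss, we show that these parameters are dynamically updated with an interpretable physical meaning, which helps explain the effectiveness of our approach, especially for high-precision detection. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle heading estimation, and experimental results on twelve public datasets (2-D/3-D, aerial/text/face images) with various base detectors show its superiority.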
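
To make the idea of treating a rotated box as a Gaussian more concrete, here is a minimal sketch of the conversion and of the closed-form KL divergence between two 2-D Gaussians. It assumes the common convention that a box (cx, cy, w, h, theta) maps to a mean (cx, cy) and a covariance R diag(w²/4, h²/4) Rᵀ; the function names are hypothetical, and the loss actually used in the paper may apply further normalization on top of the raw divergence.

```python
import math


def box_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to (mean, covariance) with covariance stored as (sxx, sxy, syy).

    Assumed convention: covariance = R * diag(w^2/4, h^2/4) * R^T.
    """
    a, b = w * w / 4.0, h * h / 4.0
    c, s = math.cos(theta), math.sin(theta)
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def kld(pred, target):
    """Closed-form KL(pred || target) for two 2-D Gaussians."""
    (mpx, mpy), (pxx, pxy, pyy) = pred
    (mtx, mty), (txx, txy, tyy) = target
    det_p = pxx * pyy - pxy * pxy
    det_t = txx * tyy - txy * txy
    # Inverse of the symmetric 2x2 target covariance.
    ixx, ixy, iyy = tyy / det_t, -txy / det_t, txx / det_t
    trace = ixx * pxx + 2.0 * ixy * pxy + iyy * pyy
    dx, dy = mtx - mpx, mty - mpy
    mahalanobis = dx * dx * ixx + 2.0 * dx * dy * ixy + dy * dy * iyy
    return 0.5 * (trace + mahalanobis - 2.0 + math.log(det_t / det_p))


# The same physical box described two ways gives the same Gaussian,
# so the divergence is (numerically) zero.
same_box = kld(box_to_gaussian(0, 0, 4, 2, 0.3),
               box_to_gaussian(0, 0, 2, 4, 0.3 + math.pi / 2))
# A square box yields an isotropic Gaussian regardless of its angle.
square = kld(box_to_gaussian(0, 0, 2, 2, 0.0), box_to_gaussian(0, 0, 2, 2, 0.7))
```

The two example calls hint at why the boundary-discontinuity and square-like issues mentioned in the abstract do not arise in this representation: equivalent parameterizations of the same box collapse to a single Gaussian.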

Disentangling 3D Attributes from a Single 2D Image: Human Pose, Shape and Garment

Xue Hu, Xinghui Li, Benjamin Busam, Yiren Zhou, Ales Leonardis, Shanxin Yuan

Categories: Computer Vision

2022-08-05

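For visual manipulation tasks, we aim to represent image content with semantically meaningful features. However, learning implicit representations from images often lacks interpretability, especially when attributes are intertwined. We focus on the challenging task of extracting disentangled 3D attributes from 2D image data alone. Specifically, we focus on human appearance and learn implicit pose, shape, and garment representations of dressed humans from RGB images. Our method learns an embedding with disentangled latent representations of these three image properties and enables meaningful re-assembly of features and property control through a 2D-to-3D encoder-decoder structure. The 3D model is inferred solely from the feature map in the learned embedding space. To the best of our knowledge, our method is the first to achieve cross-domain disentanglement for this highly under-constrained problem. We demonstrate, qualitatively and quantitatively, our framework's ability to transfer pose, shape, and garments in 3D reconstruction on virtual data, and show how an implicit shape loss benefits the model's ability to recover fine-grained reconstruction details.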

End-to-end Graph-constrained Vectorized Floorplan Generation with Panoptic Refinement

Jiachen Liu, Yuan Xue, Jose Duarte, Krishnendra Shekhawat, Zihan Zhou, Xiaolei Huang

Categories: Computer Vision

2022-07-27

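The automatic generation of floorplans from user inputs has great potential in architectural design and has recently been explored in the computer vision community. However, most existing methods synthesize floorplans as rasterized images, which are difficult to edit or customize. In this paper, we aim to synthesize floorplans as sequences of 1-D vectors, which eases user interaction and design customization. To generate high-fidelity vectorized floorplans, we propose a novel two-stage framework consisting of a draft stage and a multi-round refining stage. In the first stage, we encode the user's room connectivity graph with a graph convolutional network (GCN), then apply an autoregressive transformer network to generate an initial floorplan sequence. To polish the initial design and generate more visually appealing floorplans, we further propose a novel panoptic refinement network (PRN) composed of a GCN and a transformer network. The PRN takes the initially generated sequence as input and refines the floorplan design while encouraging correct room connectivity through our proposed geometric loss. We have conducted extensive experiments on a real-world floorplan dataset, and the results show that our method achieves state-of-the-art performance under different settings and evaluation metrics.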

3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform

Yining Zhao, Chao Wen, Zhou Xue, Yue Gao

Categories: Computer Vision

2022-07-19

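In estimating the 3D room layout from a single panoramic image, the significant geometric structures can be compactly described by global wireframes. Based on this observation, we present an alternative approach that estimates the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. We transform the image features from a cubemap tile into the Hough space of a Manhattan world and map these features directly to the geometric output. The convolutional layers not only learn local gradient-like line features but also use global information to successfully predict occluded walls with a simple network structure. Unlike most previous work, predictions are performed individually on each cubemap tile and then assembled to obtain the layout estimate. Experimental results show that we achieve results comparable to recent state-of-the-art methods in prediction accuracy and performance. Code is available at https://github.com/starrah/dmh-net.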

Fine-grained Correlation Loss for Regression

Chaoyu Chen, Xin Yang, Ruobing Huang, Xindi Hu, Yankai Huang, Xiduo Lu, Xinrui Zhou, Mingyuan Luo, Yinyu Ye, Xue Shuang

Categories: Computer Vision

2022-07-01

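Regression learning is classic and fundamental to medical image analysis. It provides the continuous mapping for many critical applications, such as attribute estimation, object detection, segmentation, and non-rigid registration. However, previous studies mainly took case-wise criteria, such as the mean squared error, as optimization objectives. They ignored the very important population-wise correlation criterion, which is exactly the final evaluation metric in many tasks. In this work, we propose to revisit classic regression tasks with novel investigations into directly optimizing fine-grained correlation losses. We mainly explore two complementary correlation indexes as learnable losses: Pearson linear correlation (PLC) and Spearman rank correlation (SRC). The contributions of this paper are twofold. First, for PLC at the global level, we propose a strategy to make it robust against outliers and to regularize the key distribution factors. These efforts significantly stabilize learning and magnify the efficacy of PLC. Second, for SRC at the local level, we propose a coarse-to-fine scheme to ease the learning of the exact ranking order among samples. Specifically, we convert the learning of sample rankings into the learning of similarity relationships among samples. We extensively validate our method on two typical ultrasound image regression tasks: image quality assessment and biometric measurement. Experiments show that, with the fine-grained guidance of directly optimizing the correlation, regression performance improves significantly. Our proposed correlation losses are general and can be extended to more important applications.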
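
For readers less familiar with the two correlation indexes, the sketch below computes plain Pearson and Spearman correlations and a naive loss of the form 1 minus the Pearson correlation. This is only the textbook definition with hypothetical function names; the outlier-robust PLC strategy and the coarse-to-fine SRC scheme described in the abstract are not reproduced, and the rank operation used here is not differentiable as written.

```python
import math


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def plc_loss(pred, target):
    """Naive correlation loss: 0 when perfectly correlated, 2 when anti-correlated."""
    return 1.0 - pearson(pred, target)


loss = plc_loss([0.1, 0.4, 0.35, 0.8], [1.0, 2.0, 3.0, 4.0])
```

Unlike a case-wise criterion such as the mean squared error, both quantities are computed over a whole batch of predictions, which is what makes them population-wise.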

Chinese Word Sense Embedding with SememeWSD and Synonym Set

Yangxi Zhou, Junping Du, Zhe Xue, Ang Li, Zeli Guan

Categories: Natural Language Processing

2022-06-29

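Word embedding is a fundamental natural language processing task that learns features of words. However, most word embedding methods assign only one vector to a word, even though polysemous words have multiple senses. To address this limitation, we propose the SememeWSD Synonym (SWSD) model, which assigns a different vector to every sense of a polysemous word with the help of word sense disambiguation (WSD) and the synonym sets in OpenHowNet. We use the SememeWSD model, an unsupervised word sense disambiguation model based on OpenHowNet, to perform word sense disambiguation and annotate each polysemous word with a sense ID. We then obtain the top 10 synonyms of the word sense from OpenHowNet and compute the average vector of these synonyms as the vector of the word sense. In experiments, we evaluate the SWSD model on semantic similarity calculation using Gensim's wmdistance method, and it achieves an improvement in accuracy. We also examine the SememeWSD model with different BERT models to find the more effective one.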
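
The averaging step described above can be sketched in a few lines. The function name `sense_vector`, the plain-dictionary embedding table, and the choice to skip synonyms without an embedding are all assumptions made for illustration; how the synonyms are retrieved from OpenHowNet is outside the scope of this sketch.

```python
def sense_vector(synonyms, embeddings, top_k=10):
    """Average the embeddings of the first top_k synonyms of a word sense.

    `synonyms` is assumed to be ordered by relevance; synonyms missing from
    `embeddings` are skipped (an assumption, not stated in the abstract).
    """
    vectors = [embeddings[word] for word in synonyms[:top_k] if word in embeddings]
    if not vectors:
        raise ValueError("no synonym has an embedding")
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


# Hypothetical two-dimensional embeddings for two synonyms of one sense.
toy_embeddings = {"a": [1.0, 0.0], "b": [3.0, 2.0]}
vector = sense_vector(["a", "b", "c"], toy_embeddings)
```

Each sense of a polysemous word then receives its own vector, computed from that sense's synonym list rather than from the surface word.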

TreeDRNet: A Robust Deep Model for Long Term Time Series Forecasting

Tian Zhou, Jianqing Zhu, Xue Wang, Ziqing Ma, Qingsong Wen, Liang Sun, Rong Jin

Categories: Machine Learning

2022-06-24

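Various deep learning models, especially some recent Transformer-based approaches, have greatly improved the state-of-the-art performance of long-term time series forecasting. However, these Transformer-based models suffer severe performance deterioration as the input length grows, which prevents them from using extended historical information. Moreover, these methods tend to handle complex examples in long-term forecasting by increasing model complexity, which often leads to a significant increase in computation and less robust performance (e.g., overfitting). We propose a novel neural network architecture, called TreeDRNet, for more effective long-term forecasting. Inspired by robust regression, we introduce a doubly residual link structure to make predictions more robust. Building on the Kolmogorov-Arnold representation theorem, we explicitly introduce feature selection, model ensembling, and a tree structure to further exploit the extended input sequence, which improves the robustness and representation power of TreeDRNet. Unlike previous deep models for sequential forecasting, TreeDRNet is built entirely on multilayer perceptrons and thus enjoys high computational efficiency. Our extensive empirical studies show that TreeDRNet is significantly more effective than state-of-the-art methods, reducing prediction errors by 20% to 40%. In particular, TreeDRNet is 10 times more efficient than Transformer-based methods. The code will be released soon.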
